Bilingual Insights into the Initial Lexicon

The Role of Cognates in Word Acquisition

Gonzalo Garcia-Castro

PhD Defence / Departament de Medicina i Ciències de la Vida

2024-11-03


The initial lexicon

Average 20-year-old knows ~42,000 lemmas: mental lexicon

Lexical representations
Phonological, conceptual, grammatical information of known words
Form-meaning association

First lexical representations at 6-9 months

Normative trajectories of lexical development

Vocabulary size norms for 51,800 monolingual children learning 35 distinct languages (Wordbank, Frank et al. 2017)

Bilinguals face additional challenges, but do not lag behind

  • Increased complexity in linguistic context (learning two codes)
  • Reduced linguistic input (split into two languages)
  • Increased referential ambiguity

Hoff et al. (2012): bilinguals acquire words at rates similar to monolinguals

  • 47 English-Spanish bilinguals
  • 56 English monolinguals in Florida

Lexical similarity modulates vocabulary acquisition in bilinguals

Floccia et al. (2018): CDI responses of 372 bilinguals (UK) learning English + additional language

Lexical similarity: average phonological similarity (Levenshtein similarity) between pairs of translations

English-Dutch (22.14%) > English-Mandarin (1.97%)
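
The Levenshtein similarity measure can be sketched as follows. This is a minimal illustration, assuming similarity is defined as 1 minus the edit distance divided by the length of the longer form (a common normalisation that these slides do not confirm). The function names are hypothetical, and the example strings are simplified transcriptions without stress marks or syllable boundaries.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer form."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def lexical_similarity(pairs) -> float:
    """Average similarity across pairs of translation equivalents."""
    return sum(levenshtein_similarity(a, b) for a, b in pairs) / len(pairs)


pairs = [("gat", "gato"), ("gos", "pero")]
print(lexical_similarity(pairs))
```

Under this normalisation, the pair "gat"/"gato" scores 0.75 and the pair "gos"/"pero" scores 0.0, so the two-pair average is 0.375.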


Higher lexical similarity, larger vocabulary size

Stronger effect in the additional language (e.g., Dutch, Mandarin)

Lexical similarity modulates vocabulary acquisition in bilinguals

Pairwise lexical similarity (average Levenshtein similarity across translations in Floccia et al.)

A cognate facilitation in lexical acquisition?

Cognates: phonologically-similar translation equivalents

Cognate: [cat] /ˈgat-ˈga.to/
Non-cognate: [dog] /ˈgos-ˈpe.ro/

Some evidence that cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2023; Bosch and Ramon-Casas 2014)


What mechanisms support a cognate facilitation during word acquisition?

Language non-selective lexical access

Activation spreads across non-selected representations in both languages, through phonological and conceptual links in adults (e.g., Costa, Caramazza, and Sebastian-Galles 2000) and infants (e.g., Von Holzen and Mani 2012; Singh 2014)

The present dissertation

Study 1

  1. Provide a mechanistic account for the cognateness facilitation
  2. Test predictions of the model

Submitted, under review

Study 2

  1. Test core assumption of the model: language non-selectivity in the initial lexicon

In preparation

Study 1

Cognate beginnings to lexical acquisition: the AMBLA model

Accumulator Model of Bilingual Lexical Acquisition (AMBLA)

  1. Accumulation of information about form-meaning mappings:
  • Provided by learning instances: exposure to a word-form that results in the accumulation of information about its meaning
  2. Age of acquisition: the infant accumulates a threshold amount of learning instances for a word-form


\[ \begin{aligned} \definecolor{myred}{RGB}{ 168, 0, 53 } \definecolor{myblue}{RGB}{ 0, 64, 168 } \definecolor{mygreen}{RGB}{0, 168, 87} \definecolor{grey}{RGB}{128, 128, 128} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ {\color{mygreen}\text{Age of Acquisition}_{ij}} &= \{\text{Age}_i \mid {\color{myred}\text{Learning instances}_{ij}} = {\color{myblue}\text{Threshold}} \}\\ {\color{myred}\text{Learning instances}_{ij}} &= \text{Age}_i \cdot \text{Freq}_j \\ \end{aligned} \]

AMBLA: monolingual word acquisition

Parameters:

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
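
A minimal sketch of the accumulator for the monolingual case. It rests on assumptions the slide does not state: age is counted in whole months, Freq is the number of learning instances per month, and the Poisson draw is replaced by its mean (λ = 50) to keep the example deterministic. Because age advances in discrete steps, the threshold is checked with ≥ rather than strict equality. The function name is hypothetical.

```python
def age_of_acquisition(freq_per_month: float, threshold: float, max_age: int = 60):
    """Return the first age (in months) at which the accumulated learning
    instances reach the threshold, or None if that never happens by max_age."""
    for age in range(1, max_age + 1):
        learning_instances = age * freq_per_month  # Learning instances = Age * Freq
        if learning_instances >= threshold:
            return age
    return None


# Catalan monolingual child, 100% exposure, Freq fixed at the Poisson mean.
print(age_of_acquisition(freq_per_month=50, threshold=300))
```

With these values the word-form reaches the threshold at 6 months.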

AMBLA: bilingual word acquisition

  1. Linguistic input divided into two languages: Catalan 60%, Spanish 40%

Exposure: proportion of time the child is exposed to the language of word-form \(j\)

Accumulation of learning instances, a function of Exposure and Frequency.

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot {\color{myred}\text{Exposure}_{ij}}\\ \end{aligned} \]

AMBLA: bilingual word acquisition

Parameters:

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgos/ (Catalan), 60%

  • /ˈpe.ro/ (Spanish), 40%

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
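
As a worked illustration of the exposure term, assume age is measured in months and replace the Poisson draw by its mean, \(\text{Freq}_j = 50\) (both are assumptions made for the arithmetic, not values stated on the slide). Solving \(\text{Learning instances}_{ij} = \text{Threshold}\) for age gives:

\[ \begin{aligned} \text{Age of Acquisition}_{ij} &= \frac{\text{Threshold}}{\text{Freq}_j \cdot \text{Exposure}_{ij}} \\ \text{Monolingual, Catalan word (100\%):} \quad \frac{300}{50 \cdot 1.0} &= 6 \\ \text{Bilingual, Catalan word (60\%):} \quad \frac{300}{50 \cdot 0.6} &= 10 \\ \text{Bilingual, Spanish word (40\%):} \quad \frac{300}{50 \cdot 0.4} &= 15 \end{aligned} \]

Under these assumptions, the lower the exposure to a language, the later its word-forms reach the threshold.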

AMBLA: cognate facilitation

  1. Words may accumulate additional learning instances from the co-activation of their (phonologically similar) translation equivalent

Degree proportional to their phonological similarity (Cognateness)

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot \text{Exposure}_{ij} + \\ &({\color{myred}\text{Learning instances}_{ij'} \cdot \text{Cognateness}_{j,j'}})\\ \textbf{where:} \\ {\color{myred}\text{Cognateness}_{j,j'}}&{\color{myred} = \text{Levenshtein similarity}(j, j')} \end{aligned} \]

AMBLA: cognate facilitation

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgos/ (Catalan), 60%

  • /ˈpe.ro/ (Spanish), 40%

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \\ \text{Cognateness}_{j,j'} = 0.75 \end{aligned} \]
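
The cognate term can be illustrated with a small deterministic simulation. This is a sketch under several assumptions that go beyond the slide: age is in whole months, Freq is fixed at the Poisson mean (50 per month), and, to avoid the circularity of each word-form feeding the other, a word-form only receives Cognateness times the instances its translation gained from direct input (no second-order spreading). The function name is hypothetical.

```python
def simulate_aoa(exposure, freq_per_month, cognateness, threshold, max_age=60):
    """Age (in months) at which each of two translation equivalents reaches
    the threshold. `exposure` holds the exposure proportions of the two languages."""
    # Instances gained per month from direct input: Freq * Exposure.
    own = [freq_per_month * e for e in exposure]
    # Assumed one-step spreading: add Cognateness * the translation's direct input.
    rate = [own[0] + cognateness * own[1],
            own[1] + cognateness * own[0]]
    aoa = [None, None]
    for age in range(1, max_age + 1):
        for k in (0, 1):
            if aoa[k] is None and age * rate[k] >= threshold:
                aoa[k] = age
    return aoa


# Catalan (60%) and Spanish (40%) word-forms, with and without form overlap.
non_cognate = simulate_aoa((0.6, 0.4), 50, 0.0, 300)
cognate = simulate_aoa((0.6, 0.4), 50, 0.75, 300)
print(non_cognate, cognate)
```

With these values the non-cognate pair is acquired at 10 and 15 months and the cognate pair at 7 and 8 months, so the word-form in the lower-exposure language gains more (7 months) than the one in the higher-exposure language (3 months).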

Predictions

  1. Cognates acquired earlier than non-cognates
  2. Cognateness facilitation stronger in the lower-exposure language

Dataset

  • Barcelona Vocabulary Questionnaire (BVQ): 302 Catalan-Spanish noun translation equivalents
  • 436 administrations, 366 children (12-32 months)
  • Lang. exposure: Catalan and Spanish (\(\leq\) 10% 3rd language)
  • 138,078 item responses (No, Understands, Understands and Says)

Results: comprehension

Ordinal, multilevel (Bayesian) regression model

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)
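
To illustrate how an ordinal model turns a linear predictor into probabilities for the three response categories (No, Understands, Understands and Says), here is a minimal sketch assuming a cumulative-logit link and reading the product in the formula as main effects plus an interaction. The link function, the cutpoints, and every coefficient value are hypothetical placeholders chosen for illustration; they are not estimates from the dissertation, and the multilevel structure is omitted.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probabilities(eta: float, cutpoints: list[float]) -> list[float]:
    """Cumulative-logit model: P(Y <= k) = logistic(cutpoint_k - eta)."""
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


def linear_predictor(exposure, cognateness, b_exp, b_cog, b_int):
    """Main effects plus the Exposure x Cognateness interaction."""
    return b_exp * exposure + b_cog * cognateness + b_int * exposure * cognateness


# Hypothetical coefficients and cutpoints, for illustration only.
eta = linear_predictor(exposure=0.4, cognateness=0.75, b_exp=2.0, b_cog=1.0, b_int=-1.0)
probs = ordinal_probabilities(eta, cutpoints=[0.0, 1.5])
print(probs)  # probabilities of: No, Understands, Understands and Says
```

The three probabilities sum to 1, and a larger linear predictor shifts probability mass towards the higher response categories.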

Results: production

Ordinal, multilevel (Bayesian) regression model

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)

Discussion

Earlier acquisition for cognates vs. non-cognates

Cognate facilitation moderated by exposure

Only words from the lower-exposure language benefit from cognateness

Cognateness as a candidate mechanism underlying Floccia et al.’s results

Cross-language facilitation via co-activation of phonologically similar translation equivalents

Is language non-selectivity already present in the initial lexicon?

Study 2

Developmental trajectories of bilingual spoken word recognition

Language non-selectivity in the initial lexicon

Some evidence in infants and children (e.g., Von Holzen and Mani 2012; Singh 2014)

Methodological pitfalls: “bilingual” task

Implicit naming task

(Mani and Plunkett 2010, 2011)

English monolinguals

Study 2: design

Extending the task to test cross-language priming in bilinguals.

Change in order of trial timecourse:


Auditory label before target-distractor images

Length of Catalan and Spanish words

Temporal proximity of prime and target labels

Predictions and dataset

Exp. 1: monolinguals

Replicate within-language phonological interference from Mani and Plunkett (proof of concept)

Exp. 2: monolinguals and bilinguals

If language non-selectivity, stronger interference in cognate vs. non-cognate trials

  • 79 English monolinguals (89 sessions)
  • 77 Catalan/Spanish monolinguals (107 sessions)
  • 78 Catalan/Spanish bilinguals (133 sessions)

Experiment 1: results, Bayesian GAMMs

Experiment 2: results, Bayesian GAMMs

Discussion

Successful word recognition across ages and language profiles

No evidence of priming effects, within or across languages

Most likely due to design caveats

General discussion

Summary

Cognateness facilitates word acquisition in the lower-exposure language

Candidate mechanism behind bilingual vocabulary growth

AMBLA: cross-language accumulation of learning instances

Language non-selectivity in the initial lexicon: pending severe testing

Contributions

Barcelona Vocabulary Questionnaire (BVQ) + package +

Levenshtein distance as a valid measure of word-level effects of phonological similarity

{jtracer} package

AMBLA: natural extension of the Standard Model of language acquisition? (Kachergis, Marchman, and Frank 2022)

Future steps

Backward Semantic Inhibition

Does cognateness impact the acquisition of other grammatical categories (e.g., verbs, adjectives)?

Conclusions

  • Insights into the developing bilingual lexicon: cognateness
  • Evidence in favour of language non-selectivity as the mechanism underlying the cognate facilitation
  • Important consequences for bilingual vocabulary growth

Thanks!

Appendix

Introduction: bilingualism

Classification of participants into monolinguals and bilinguals

Introduction: cognate contents in the aggregated vocabulary

Cognate contents in the aggregated vocabulary

Study 1: posterior regression coefficients

Aggregated vocabularies might conceal facilitation effects

Study 1: MCMC convergence (\(\hat{R}\))

MCMC convergence for the model in Study 1

Study 2: predictions

  • Successful spoken word recognition across groups
  • If language non-selectivity, stronger interference in cognate vs. non-cognate trials

Study 2: vocabulary size

Study 2 participant receptive vocabulary sizes across ages and language profiles

Study 2: model convergence (Exp. 1)

MCMC convergence for the model in Study 2 (Exp. 1)

Study 2: model convergence (Exp. 2)

MCMC convergence for the model in Study 2 (Exp. 2)

References

Bergelson, Elika, and Daniel Swingley. 2012. “At 6–9 Months, Human Infants Know the Meanings of Many Common Nouns.” Proceedings of the National Academy of Sciences 109 (9): 3253–58. https://doi.org/10.1073/pnas.1113380109.
Bosch, Laura, and Marta Ramon-Casas. 2014. “First Translation Equivalents in Bilingual Toddlers’ Expressive Vocabulary: Does Form Similarity Matter?” International Journal of Behavioral Development 38 (4): 317–22. https://doi.org/10.1177/0165025414532559.
Costa, Albert, Alfonso Caramazza, and Nuria Sebastian-Galles. 2000. “The Cognate Facilitation Effect: Implications for Models of Lexical Access.” Journal of Experimental Psychology: Learning, Memory, and Cognition 26 (5): 1283. https://doi.org/10.1037/0278-7393.26.5.1283.
Fenson, Larry, Philip S Dale, J Steven Reznick, Elizabeth Bates, Donna J Thal, Stephen J Pethick, Michael Tomasello, Carolyn B Mervis, and Joan Stiles. 1994. “Variability in Early Communicative Development.” Monographs of the Society for Research in Child Development 59 (5): 1–185. https://doi.org/10.2307/1166093.
Frank, Michael C., Mika Braginsky, Daniel Yurovsky, and Virginia A. Marchman. 2017. “Wordbank: An Open Repository for Developmental Vocabulary Data.” Journal of Child Language 44 (3): 677–94. https://doi.org/10.1017/s0305000916000209.
Kachergis, George, Virginia A. Marchman, and Michael C. Frank. 2022. “Toward a ‘Standard Model’ of Early Language Learning.” Current Directions in Psychological Science 31 (1): 20–27. https://doi.org/10.1177/09637214211057836.
Mani, Nivedita, and Kim Plunkett. 2010. “In the Infant’s Mind’s Ear: Evidence for Implicit Naming in 18-Month-Olds.” Psychological Science 21 (7): 908–13. https://doi.org/10.1177/0956797610373371.
———. 2011. “Phonological Priming and Cohort Effects in Toddlers.” Cognition 121 (2): 196–206. https://doi.org/10.1016/j.cognition.2011.06.013.
Mitchell, Lori, Rachel Ka-Ying Tsui, and Krista Byers-Heinlein. 2023. “Cognates Are Advantaged over Non-Cognates in Early Bilingual Expressive Vocabulary Development.” Journal of Child Language, 1–20.
Singh, Leher. 2014. “One World, Two Languages: Cross-Language Semantic Priming in Bilingual Toddlers.” Child Development 85 (2): 755–66. https://doi.org/10.1111/cdev.12133.
Tincoff, Ruth, and Peter W Jusczyk. 1999. “Some Beginnings of Word Comprehension in 6-Month-Olds.” Psychological Science 10 (2): 172–75. https://doi.org/10.1111/1467-9280.00127.
Von Holzen, Katie, and Nivedita Mani. 2012. “Language Nonselective Lexical Access in Bilingual Toddlers.” Journal of Experimental Child Psychology 113 (4): 569–86. https://doi.org/10.1016/j.jecp.2012.08.001.